ABSTRACT

We present a general approach for multi-modal sensor fusion
based on nonparametric probability density estimation and
maximization of a mutual information criterion. We apply this
approach to fusion of vector-magnetic and acoustic data for
classification of vehicles. Linear features are used, although the
approach may be applied more generally with other sensor
modalities, nonlinear features, and other classification targets. For
the magnetic data, we present a parametric model with
computationally efficient parameter estimation. Experimental
results are provided illustrating the effectiveness of a classifier that
discriminates between cars and sport utility vehicles.

Index Terms: sensor network, classification, sensor fusion,
mutual information.


1. INTRODUCTION

Multimodal sensor networks are deployed in many scenarios
where each node contains several sensor modalities, such as
acoustic, magnetic, seismic, radar, electrostatic, infrared, optical,
and others. We focus on fusing two modalities, acoustic and
magnetic, for the purpose of classifying civilian vehicles such as
cars, sport utility vehicles (SUVs), and trucks. In this work, the
magnetic sensor is a vector magnetometer and the acoustic sensor
is a single microphone, and the vehicle is moving along a road so
that the range to the sensors is known approximately.

For a magnetic source moving with constant velocity, a model
for the vector magnetometer output signal is available based on
linear combinations of Anderson functions. We use this model to
estimate the source speed and reduce the vector magnetic data to 9
parameters. A corresponding parametric model is not available for
the acoustic signal from civilian vehicles, and models are not
known for the joint statistical dependence between the magnetic
and acoustic signals. We address this by using nonparametric
probability density estimation to learn the joint statistics from
training data, and then the magnetic-acoustic data is fused by
extracting features for classification that maximize an
information-theoretic criterion. We apply the approach with measured data
from civilian vehicles, demonstrating that fusion of magnetic and
acoustic data using the information-theoretic criterion improves
the ability to discriminate between cars and SUVs.

2. VECTOR MAGNETIC SENSOR MODEL AND
PARAMETER ESTIMATION

We begin with a review of a parametric model for the vector
magnetic field observed at a sensor in a time interval around the
closest point of approach (CPA) of the source, then we present a
computationally efficient algorithm for estimating the parameters.

2.1. Magnetic data model

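We assume that a magnetic dipole with moment vector $\mathbf{m}$ passes
CPA at time $t = 0$, as illustrated in Figure 1. The CPA range is
denoted by $R_{\mathrm{CPA}}$, the source velocity vector is $\mathbf{v}$, and the speed
is $v$. Then each component of the vector magnetic field can be
expressed as a linear combination of the Anderson functions [1]

$$ f_n(\alpha t) = \frac{(\alpha t)^n}{\left[1 + (\alpha t)^2\right]^{5/2}}, \quad n = 0, 1, 2, \tag{1} $$

where the time-scale $\alpha$ is related to the speed and $R_{\mathrm{CPA}}$ as

$$ \alpha = \frac{v}{R_{\mathrm{CPA}}}. \tag{2} $$
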
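The magnetic field components vary with time according to [1]

$$ \mathbf{B}(t) = \left[ B_x(t) \;\; B_y(t) \;\; B_z(t) \right] = \mathbf{F}(\alpha t)\,\mathbf{C} + \mathbf{e}(t), \tag{3} $$

where $\mathbf{F}(\alpha t)$ is a 1x3 vector of Anderson functions,

$$ \mathbf{F}(\alpha t) = \left[ f_0(\alpha t), \; f_1(\alpha t), \; f_2(\alpha t) \right], \tag{4} $$

$\mathbf{C}$ is a 3x3 matrix of coefficients, and $\mathbf{e}(t)$ is a 1x3 vector that
accounts for deviations from the ideal model due to noise, sensor
motion, extraneous magnetic sources, and other effects.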
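
The model in (3) and (4) can be sketched numerically as follows. This is a minimal illustration, assuming NumPy and the Anderson function form $f_n(x) = x^n / (1 + x^2)^{5/2}$; the function names `anderson_basis` and `model_field` and all numeric values are hypothetical choices made for the example, not taken from the measured data.

```python
import numpy as np

def anderson_basis(alpha, t):
    """Return the N x 3 matrix whose columns are f_0, f_1, f_2 evaluated at alpha*t."""
    x = alpha * np.asarray(t, dtype=float)
    denom = (1.0 + x ** 2) ** 2.5
    return np.stack([1.0 / denom, x / denom, x ** 2 / denom], axis=1)

def model_field(alpha, C, t):
    """Noise-free model B(t) = F(alpha t) C, one row [Bx, By, Bz] per time sample."""
    return anderson_basis(alpha, t) @ np.asarray(C, dtype=float)

# Hypothetical values, chosen only for illustration.
t = np.linspace(-5.0, 5.0, 201)   # seconds, with CPA at t = 0
alpha = 1.2                        # time-scale in 1/s
C = np.array([[1.0, 0.0, 0.5],
              [0.0, 2.0, 0.0],
              [0.3, 0.0, 1.0]])
B = model_field(alpha, C, t)       # shape (201, 3)
```

At CPA the basis reduces to [1, 0, 0], so the field at $t = 0$ equals the first row of C in this sketch.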

The model can be further elaborated to relate the elements of
$\mathbf{C}$ to the magnetic dipole moment vector $\mathbf{m}$, the direction of
motion $\mathbf{v}/v$, and the CPA range [1]. We do not include these
detailed (and nonlinear) relationships here, but we note the
following characteristics of $\mathbf{C}$ from the detailed model [1]:

1. The $\mathbf{C}$ matrix is scaled by a factor that is proportional to
$m / R_{\mathrm{CPA}}^3$, where $m$ is the magnetic dipole moment
magnitude. Therefore the elements of $\mathbf{C}$ have magnitudes
that are proportional to $m$ and decay rapidly with range.

2. The variations from element to element in $\mathbf{C}$ depend on the
orientations of the magnetic dipole vector $\mathbf{m}/m$ and the
direction of motion $\mathbf{v}/v$.

3. The source speed $v$ enters the model in (3) only through the
time-scale parameter $\alpha$ in (2).

These observations imply that the source speed can be estimated
from $\alpha$ if the CPA range is known. Also, $\mathbf{C}$ characterizes the
magnetic dipole moment vector of the source if the sensor is
placed near a road, because then the direction of motion $\mathbf{v}/v$ is
fixed (except for a sign difference for left-to-right and right-to-left
motion) and the CPA range is approximately known (within the
width of the road). However, the $\mathbf{C}$ matrix will be different for
left-to-right and right-to-left motion. Therefore we use $\mathbf{C}$ to
summarize the magnetic properties of the source for classification.
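
As a small worked example of recovering speed from the time-scale, the sketch below applies $v = \alpha R_{\mathrm{CPA}}$ with a unit conversion. The helper name `speed_mph` is hypothetical, and the numbers (15 mph, CPA ranges of 19 ft and 28 ft) are used only to illustrate how an uncertain range maps to an uncertain speed.

```python
FT_PER_MILE = 5280.0
SEC_PER_HOUR = 3600.0

def speed_mph(alpha_per_s, r_cpa_ft):
    """Speed v = alpha * R_CPA, converted from ft/s to mph."""
    return alpha_per_s * r_cpa_ft * SEC_PER_HOUR / FT_PER_MILE

# A source moving at 22 ft/s (15 mph) with CPA range 19 ft has alpha = 22/19 per second.
alpha = 22.0 / 19.0
v_exact = speed_mph(alpha, 19.0)

# If the range is only known to lie between 19 ft and 28 ft, the same alpha
# is consistent with a span of speeds.
v_low, v_high = speed_mph(alpha, 19.0), speed_mph(alpha, 28.0)
```

The speed estimate scales linearly with the assumed range, so the relative error in speed equals the relative error in $R_{\mathrm{CPA}}$.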
2.2. Parameter estimation


The model in (3) contains 10 parameters in $\alpha$ and $\mathbf{C}$, where
$\mathbf{B}(t)$ represents the measured vector magnetic sensor data. The
parameters may be estimated by minimizing the squared error,

$$ \varepsilon^2(\alpha, \mathbf{C}) = \mathrm{Tr}\left\{ \int \left[\mathbf{B}(t) - \mathbf{F}(\alpha t)\mathbf{C}\right]^T \left[\mathbf{B}(t) - \mathbf{F}(\alpha t)\mathbf{C}\right] dt \right\}, \tag{5} $$

where superscript $T$ denotes the transpose operation, Tr is the trace
of the matrix, and the integration limits are from $-\infty$ to $\infty$. For
fixed $\alpha$, the least-squares estimate of $\mathbf{C}$ is

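$$ \hat{\mathbf{C}}\big|_{\alpha} = \arg\min_{\mathbf{C}} \varepsilon^2(\alpha, \mathbf{C}) = \mathbf{G}(\alpha) \int \mathbf{F}(\alpha t)^T \mathbf{B}(t)\, dt, \tag{6} $$

where the 3x3 matrix $\mathbf{G}(\alpha)$ can be expressed in closed form as

$$ \mathbf{G}(\alpha) = \left[ \int \mathbf{F}(\alpha t)^T \mathbf{F}(\alpha t)\, dt \right]^{-1} = \frac{8|\alpha|}{5\pi} \begin{bmatrix} 3 & 0 & -5 \\ 0 & 16 & 0 \\ -5 & 0 & 35 \end{bmatrix}. \tag{7} $$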
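
The closed form in (7) can be checked numerically. The sketch below is an illustration only, assuming NumPy, the Anderson function form $f_n(x) = x^n/(1+x^2)^{5/2}$, and the scale factor $8|\alpha|/(5\pi)$ that follows from integrating the products of those functions; the names `G_closed_form` and `G_numeric` are hypothetical.

```python
import numpy as np

def G_closed_form(alpha):
    """Closed-form inverse Gram matrix of the Anderson functions, as in (7)."""
    M = np.array([[3.0, 0.0, -5.0],
                  [0.0, 16.0, 0.0],
                  [-5.0, 0.0, 35.0]])
    return 8.0 * abs(alpha) / (5.0 * np.pi) * M

def G_numeric(alpha, half_width=100.0, n=200001):
    """Invert a Riemann-sum approximation of the integral of F^T F over a long window."""
    t = np.linspace(-half_width, half_width, n)
    x = alpha * t
    denom = (1.0 + x ** 2) ** 2.5
    F = np.stack([1.0 / denom, x / denom, x ** 2 / denom], axis=1)
    dt = t[1] - t[0]
    gram = dt * (F.T @ F)
    return np.linalg.inv(gram)

G_cf = G_closed_form(1.2)
G_num = G_numeric(1.2)
```

The two matrices agree closely when the window is long compared with $1/\alpha$ and the sample spacing is short, which is the same condition under which the approximations used later for sampled data hold.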

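Next, to find the least-squares estimate for $\alpha$, we substitute (6)
into (5) to eliminate $\mathbf{C}$, leading to

$$ \hat{\alpha} = \arg\max_{\alpha} \mathrm{Tr}\left\{ \iint \mathbf{B}(s)^T \left[ \mathbf{F}(\alpha s)\, \mathbf{G}(\alpha)\, \mathbf{F}(\alpha t)^T \right] \mathbf{B}(t)\, ds\, dt \right\}. \tag{8} $$

An interpretation of (8) is that $\hat{\alpha}$ is chosen to maximize the total
energy in the orthogonal projections of the components of $\mathbf{B}(t)$
onto the subspace spanned by the Anderson functions. The
quantity inside the square brackets in (8) is a scalar function that
can be evaluated as

$$ K(s, t; \alpha) = \mathbf{F}(\alpha s)\, \mathbf{G}(\alpha)\, \mathbf{F}(\alpha t)^T = \frac{8|\alpha|}{5\pi} \, \frac{35(\alpha s)^2(\alpha t)^2 - 5(\alpha s)^2 - 5(\alpha t)^2 + 16(\alpha s)(\alpha t) + 3}{\left[1 + (\alpha s)^2\right]^{5/2} \left[1 + (\alpha t)^2\right]^{5/2}}, \tag{9} $$

so (8) may be expressed more directly as

$$ \hat{\alpha} = \arg\max_{\alpha} \iint K(s, t; \alpha)\, B(s, t)\, ds\, dt, \tag{10} $$

where

$$ B(s, t) = B_x(s) B_x(t) + B_y(s) B_y(t) + B_z(s) B_z(t). \tag{11} $$

The operation in (10) may be viewed as a vector matched filter on
$\mathbf{B}(t)$ to estimate $\alpha = v / R_{\mathrm{CPA}}$.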

In summary, the global solution to the least-squares
minimization in (5) is obtained by first solving (10) for $\hat{\alpha}$, and
then using $\hat{\alpha}$ in (6) to find $\hat{\mathbf{C}}$. The continuous-time formulation
facilitates the evaluation of the closed-form expressions in (7) and
(9). In practice, sampled data is used, so $\mathbf{B}(t)$ is replaced by an
$N \times 3$ matrix, $\mathbf{B}_N = [\mathbf{B}_x, \mathbf{B}_y, \mathbf{B}_z]$, containing $N$ samples of each
vector magnetic sensor data component with spacing $T_s$ sec
between samples. Then the model in (3) becomes

$$ \mathbf{B}_N = \mathbf{F}_N(\alpha)\, \mathbf{C} + \mathbf{e}_N, \tag{12} $$

where the $N \times 3$ matrix $\mathbf{F}_N(\alpha)$ contains samples of the Anderson
functions in (1). We define the 3x3 matrix

$$ \mathbf{G}_N(\alpha) = \left[ T_s\, \mathbf{F}_N(\alpha)^T \mathbf{F}_N(\alpha) \right]^{-1} \approx \mathbf{G}(\alpha) \tag{13} $$

and the $N \times N$ matrix

$$ \mathbf{K}_N(\alpha) = \mathbf{F}_N(\alpha)\, \mathbf{G}_N(\alpha)\, \mathbf{F}_N(\alpha)^T \approx \mathbf{K}(\alpha), \tag{14} $$

where $\mathbf{K}(\alpha)$ is an $N \times N$ matrix obtained by sampling the function
in (9). The approximations in (13) and (14) become exact as the
sample spacing $T_s \to 0$ and the processing time interval $\to \infty$.
The approximations reduce computations by eliminating the
matrix products and inverse in (13) and (14). The least-squares
estimates for $\alpha$ and $\mathbf{C}$ with discrete-time data are then

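$$ \hat{\alpha} = \arg\max_{\alpha} \left\{ \mathbf{B}_x^T \mathbf{K}_N(\alpha) \mathbf{B}_x + \mathbf{B}_y^T \mathbf{K}_N(\alpha) \mathbf{B}_y + \mathbf{B}_z^T \mathbf{K}_N(\alpha) \mathbf{B}_z \right\} \approx \arg\max_{\alpha} \left\{ \mathbf{B}_x^T \mathbf{K}(\alpha) \mathbf{B}_x + \mathbf{B}_y^T \mathbf{K}(\alpha) \mathbf{B}_y + \mathbf{B}_z^T \mathbf{K}(\alpha) \mathbf{B}_z \right\} \tag{15} $$

and

$$ \hat{\mathbf{C}} = T_s\, \mathbf{G}_N(\hat{\alpha})\, \mathbf{F}_N(\hat{\alpha})^T \mathbf{B}_N \approx T_s\, \mathbf{G}(\hat{\alpha})\, \mathbf{F}_N(\hat{\alpha})^T \mathbf{B}_N, \tag{16} $$

where the approximations in (13) and (14) are used to reduce
computation. The model-based estimate is then

$$ \hat{\mathbf{B}}_N = \mathbf{F}_N(\hat{\alpha})\, \hat{\mathbf{C}}, \tag{17} $$

and $\hat{\mathbf{B}}_N$ can be compared with the data $\mathbf{B}_N$ to assess the fit of the
model to the data. Figure 2 shows a good fit between the model
and measured data for a car traveling at 15 mph with CPA range
19 ft. The estimated speed is 15.2 mph, which agrees closely with
the ground truth. Similar fits to the source speed and model were
obtained for 25 different vehicles in the measured data set.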
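
The discrete-time estimation in (15)-(17) can be sketched as a grid search over the time-scale. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the exact least-squares projection rather than the closed-form approximations, in which case the objective in (15) equals the energy of the projected data up to the constant factor $1/T_s$. The name `fit_anderson`, the grid, and the synthetic data are hypothetical.

```python
import numpy as np

def anderson_basis(alpha, t):
    """N x 3 matrix of Anderson functions f_0, f_1, f_2 sampled at alpha*t."""
    x = alpha * np.asarray(t, dtype=float)
    denom = (1.0 + x ** 2) ** 2.5
    return np.stack([1.0 / denom, x / denom, x ** 2 / denom], axis=1)

def fit_anderson(B_N, t, alpha_grid):
    """Pick the alpha whose Anderson subspace captures the most energy, then solve for C."""
    best_energy, alpha_hat, C_hat = -np.inf, None, None
    for alpha in alpha_grid:
        F_N = anderson_basis(alpha, t)
        # Least-squares C for this alpha; equals Ts * G_N * F_N^T * B_N in (16).
        C, *_ = np.linalg.lstsq(F_N, B_N, rcond=None)
        energy = np.sum((F_N @ C) ** 2)   # energy of the projection of B_N
        if energy > best_energy:
            best_energy, alpha_hat, C_hat = energy, alpha, C
    B_hat = anderson_basis(alpha_hat, t) @ C_hat   # model-based estimate, as in (17)
    return alpha_hat, C_hat, B_hat

# Synthetic data with a known answer (illustrative values only).
rng = np.random.default_rng(0)
t = np.arange(-200, 201) * 0.05          # 401 samples, Ts = 0.05 s, CPA at t = 0
alpha_true = 1.16
C_true = np.array([[1.0, 0.0, 0.5],
                   [0.0, 2.0, 0.0],
                   [0.3, 0.0, 1.0]])
B_N = anderson_basis(alpha_true, t) @ C_true + 1e-3 * rng.standard_normal((t.size, 3))

alpha_hat, C_hat, B_hat = fit_anderson(B_N, t, np.linspace(0.5, 2.0, 151))
rel_err = np.linalg.norm(B_N - B_hat) / np.linalg.norm(B_N)
```

A grid search is used here because the objective is a function of the single scalar $\alpha$; the relative residual `rel_err` plays the role of the visual comparison between model and data.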

3. JOINT MAGNETIC-ACOUSTIC FEATURES THAT
MAXIMIZE MUTUAL INFORMATION

In  this  section,  we  consider  classification  of  vehicles  by  jointly 
processing  vector  magnetic  field  data  measured  with  a 
magnetometer  and  single-channel  acoustic  data  measured  with  a 
microphone.  The  steps  in  our  approach  for  linear  feature 
extraction  are  described  first,  followed  by  an  algorithm  for  finding 
features  that  maximize  a  mutual  information  criterion. 

3.1.  Procedure  for  linear  feature  extraction 

(1) Estimate the CPA time using the peak of the total field,

$$ t_{\mathrm{CPA}} = \arg\max_{t} B(t) = \arg\max_{t} \left[ B_x(t)^2 + B_y(t)^2 + B_z(t)^2 \right]^{1/2}. $$

Take a window of vector magnetic field samples $\mathbf{B}_N$ and acoustic
samples $\mathbf{X}_A$ with CPA at the center of the window, and redefine
the time axis for the samples so that $t = 0$ at CPA.

Figure 1: Illustration of acoustic-magnetic source moving with constant velocity near a vector magnetometer and microphone. The diagram labels the magnetic output $\mathbf{B}(t) = [B_x(t) \; B_y(t) \; B_z(t)]$ and the acoustic output $x_A(t)$.

Figure 2: Comparison of the fit between measured vector-magnetic
data and Anderson function model (15)-(17) for a car. Plot title: "MAGNETIC WITH ANDERSON APPROXIMATION".

(2) Process the window of vector magnetic field data as in
(15) and (16) to estimate the model parameters $\hat{\alpha}$ and $\hat{\mathbf{C}}$. The
columns of $\hat{\mathbf{C}}$ are stacked into a 9x1 vector $\mathbf{X}_{\mathrm{Mag}}$ that represents
the magnetic data. The source speed may be estimated from $\hat{\alpha}$
using (2) if the CPA range is known.

(3) The window of acoustic samples is placed into a vector $\mathbf{X}_A$
with dimension $N_A \times 1$. Parametric models are not available for the
acoustic signals corresponding to civilian vehicles at CPA, so it is
not obvious how the acoustic data may be reduced to a few
parameters that are analogous to $\mathbf{X}_{\mathrm{Mag}}$ for the magnetic field data.

(4) The $\mathbf{X}_{\mathrm{Mag}}$ and $\mathbf{X}_A$ vectors are stacked into a joint
magnetic-acoustic vector, $\mathbf{X} = [\mathbf{X}_{\mathrm{Mag}}^T, \mathbf{X}_A^T]^T$. We focus on magnetic and acoustic
data in this paper, but other sensor modalities may be included.

(5) The magnetic-acoustic data in $\mathbf{X}$ is processed by a linear
transformation to extract a low-dimensional feature vector $\mathbf{Y}$ with
dimension $N_Y \times 1$,

$$ \mathbf{Y} = \mathbf{A}^T \mathbf{X}, \tag{18} $$

where $\mathbf{A}$ is a matrix with dimension $(9 + N_A) \times N_Y$.
Information-theoretic criteria for choosing $\mathbf{A}$ to maximize the
classification information in $\mathbf{Y}$ are described in Section 3.2.
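
Steps (1) through (5) can be sketched end to end as follows. This is a rough illustration with synthetic stand-in signals; the helper names, the sample rates, the window lengths, the placeholder coefficient matrix, and the arbitrary orthonormal matrix `A` are all assumptions made for the example. In particular, the magnetic and acoustic channels are assumed to share a common time origin, and the matrix from step (2) is taken as given.

```python
import numpy as np

def cpa_time(B, fs):
    """Step (1): time of the peak total field, in seconds from the start of the record."""
    return np.argmax(np.linalg.norm(B, axis=1)) / fs

def window_around(x, fs, t_center, half_width_s):
    """Samples within +/- half_width_s seconds of t_center."""
    c = int(round(t_center * fs))
    h = int(round(half_width_s * fs))
    if c - h < 0 or c + h + 1 > len(x):
        raise ValueError("window extends past the recording")
    return x[c - h:c + h + 1]

def joint_vector(C_hat, x_acoustic):
    """Steps (2)-(4): stack the columns of C (9 values) above the acoustic window."""
    x_mag = np.asarray(C_hat, dtype=float).flatten(order="F")  # column-major stacking
    return np.concatenate([x_mag, np.asarray(x_acoustic, dtype=float)])

def extract_features(A, X):
    """Step (5): Y = A^T X."""
    return A.T @ X

# Synthetic stand-ins for one vehicle pass (illustrative values only).
fs_mag, fs_ac = 100.0, 1000.0
t = np.arange(600) / fs_mag
pulse = (1.0 + (1.2 * (t - 3.0)) ** 2) ** -2.5
B = pulse[:, None] * np.array([1.0, 0.5, -0.2])
rng = np.random.default_rng(0)
acoustic = rng.standard_normal(6000)

t_cpa = cpa_time(B, fs_mag)
B_win = window_around(B, fs_mag, t_cpa, 1.0)        # input to the fit in step (2)
x_A = window_around(acoustic, fs_ac, t_cpa, 0.05)   # N_A = 101 acoustic samples
C_hat = np.arange(9.0).reshape(3, 3)                # placeholder for the estimate from (16)
X = joint_vector(C_hat, x_A)                        # length 9 + N_A
A, _ = np.linalg.qr(rng.standard_normal((X.size, 3)))  # arbitrary orthonormal columns
Y = extract_features(A, X)                          # N_Y = 3 features
```

The column-major flattening mirrors the statement that the columns of the coefficient matrix are stacked, so the first three entries of `X` are the first column of `C_hat`.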


3.2.  Maximum  mutual  information  (MMI)  features 

We  begin  with  a  review  of  several  desirable  properties  of  features 
that  maximize  a  mutual  information  (MMI)  criterion.  Then  we 
review  a  particular  algorithm  [2]  for  extracting  MMI  features  that 
uses  nonparametric  probability  density  function  (pdf)  estimation  to 
learn  the  joint  statistical  dependence  between  the  magnetic  and 
acoustic  measurements  from  the  training  data. 

The dimensionality-reducing feature extraction processing in
(18) is not strictly necessary for classification, since the classifier
can operate directly on the higher-dimensional data vector, $\mathbf{X}$.
However, with small training sets, classifiers often generalize
better when they are trained with low-dimensional features that
retain the information for classification. In addition, features
derived from information-theoretic criteria have recently been
found [4] to achieve lower classification error on several data sets
than systems that jointly derive the features and the classifier, such
as [5]. An advantage of designing the features independently from
the classifier is that the features may be applied subsequently to
any of a large number of classifiers [3].

A theoretical basis for using mutual information for feature
extraction is provided by bounds on the probability of
classification error, $P_e$. Suppose that $S$ is a discrete random
variable with alphabet {1, 2, ..., M} representing the labels of $M$
classes. The mutual information (MI) between the feature vector
$\mathbf{Y}$ and the class label $S$ is denoted by $I(S; \mathbf{Y})$. It has been shown
that $P_e$ is bounded above [6] and below [7] by functions of the MI,
where the bounds are decreasing functions of the MI. Therefore
maximizing the MI in the features minimizes the bounds on $P_e$.
The upper and lower bounds in [6,7] are stated in terms of
Shannon mutual information, but the bounds have recently been
extended to Renyi mutual information in [8]. The MMI algorithm
that we use from [2] maximizes a form of Renyi mutual
information. Further justification for mutual information as a
metric for feature extraction has recently been presented in [9],
where several commonly used linear feature extraction methods
are formulated in a unified information-theoretic framework.

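The desirable properties of MMI features have been known
for some time, but computational difficulties have prevented
widespread use until recently. As above, let $S$ be a discrete
random variable with alphabet {1, 2, ..., M} that represents the
class label, let $\mathbf{Y}$ be the feature vector, let $f_S(i)$ be the a priori probability
that $S = i$, let $f_{\mathbf{Y}|S}(\mathbf{y} \mid i)$ be the probability distribution for the features
in class $i$, and let $f_{\mathbf{Y}}(\mathbf{y}) = \sum_{i=1}^{M} f_{\mathbf{Y}|S}(\mathbf{y} \mid i)\, f_S(i)$ be the distribution over
all the classes. The definitions of Renyi entropy with order $\alpha$ and
Shannon entropy for a random vector $\mathbf{X}$ with probability
distribution $f_{\mathbf{X}}(\mathbf{x})$ are [11]

$$ \text{Renyi:} \quad H_{\alpha}(\mathbf{X}) = \frac{1}{1 - \alpha} \log E_{\mathbf{X}}\left\{ f_{\mathbf{X}}(\mathbf{X})^{\alpha - 1} \right\} \tag{19} $$

$$ \text{Shannon:} \quad H(\mathbf{X}) = -E_{\mathbf{X}}\left\{ \log f_{\mathbf{X}}(\mathbf{X}) \right\}, \tag{20} $$

where $\alpha > 0$, $\alpha \neq 1$, and $\lim_{\alpha \to 1} H_{\alpha}(\mathbf{X}) = H(\mathbf{X})$. Here $\alpha$ denotes
the entropy order, not the time-scale parameter of Section 2. We will use
Renyi’s quadratic entropy with $\alpha = 2$.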
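
The relationship between (19) and (20) can be illustrated with a discrete distribution. Note the assumption: the definitions above are stated for a random vector with a density, whereas this sketch uses the discrete analog, in which the expectation of $p^{\alpha - 1}$ becomes the sum of $p_i^{\alpha}$. The function names and the example probabilities are hypothetical.

```python
import numpy as np

def renyi_entropy(p, order):
    """Discrete Renyi entropy: log(sum p_i^order) / (1 - order), for order != 1."""
    p = np.asarray(p, dtype=float)
    return np.log(np.sum(p ** order)) / (1.0 - order)

def shannon_entropy(p):
    """Discrete Shannon entropy in nats (all p_i assumed strictly positive)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

p = np.array([0.5, 0.25, 0.25])
h_shannon = shannon_entropy(p)
h_near_one = renyi_entropy(p, 1.0 + 1e-6)   # approaches the Shannon entropy
h_quadratic = renyi_entropy(p, 2.0)         # quadratic entropy: -log(sum p_i^2)
```

The quadratic case only needs the sum of squared probabilities, which is the property that makes order 2 convenient when the density is estimated with kernels.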

The  Shannon  and  Renyi  entropies  can  be  used  to  define  the 
classical  Shannon  MI  and  Renyi’s  quadratic  MI,  and  both  of  these 
MI  forms  have  been  used  for  feature  extraction.  In  [4],  Renyi’s 
quadratic  MI  is  used  and  combined  with  the  stochastic  information 
gradient  from  [12]  to  reduce  complexity.  In  [13],  Shannon  MI  is 
used  to  extract  nonlinear  MMI  features  via  the  “kernel  induced 
feature  space”  (KIFS).  Other  recent  works  [14,  15]  have  connected 
information-theoretic  learning  with  kernel  methods. 

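A different form of quadratic MI is defined in [2] for the
purpose of extracting MMI features. The quadratic mutual
information in [2] is motivated from a quadratic divergence
measure between $f_{\mathbf{Y}S}(\mathbf{y}, i) = f_{\mathbf{Y}|S}(\mathbf{y} \mid i)\, f_S(i)$ and $f_{\mathbf{Y}}(\mathbf{y})\, f_S(i)$,

$$ I_T(\mathbf{Y}; S) = \sum_{i=1}^{M} \int \left[ f_{\mathbf{Y}S}(\mathbf{y}, i) - f_{\mathbf{Y}}(\mathbf{y})\, f_S(i) \right]^2 d\mathbf{y} = \sum_{i=1}^{M} f_S(i)^2 \left[ \int f_{\mathbf{Y}|S}(\mathbf{y} \mid i)^2\, d\mathbf{y} + \int f_{\mathbf{Y}}(\mathbf{y})^2\, d\mathbf{y} - 2 \int f_{\mathbf{Y}|S}(\mathbf{y} \mid i)\, f_{\mathbf{Y}}(\mathbf{y})\, d\mathbf{y} \right]. \tag{21} $$

This same divergence measure is proposed and studied in [14, 15].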
Nonparametric Parzen windows may be used to estimate the
probability distributions in (21). The Parzen window method
places a “kernel function” around each training sample and adds
the kernels to yield a continuous function. We will use a Gaussian
kernel function with dimension $N_Y$ and diagonal covariance $\sigma^2 \mathbf{I}$,

$$ G_{\sigma^2}(\mathbf{y}) = \left( 2\pi\sigma^2 \right)^{-N_Y/2} \exp\left( -\frac{1}{2\sigma^2}\, \mathbf{y}^T \mathbf{y} \right). \tag{22} $$
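
A Parzen window estimate with the Gaussian kernel in (22) can be sketched as below. This is an illustration assuming NumPy, equal weights on all training samples, and points stored as rows; the function names, the kernel variance, and the synthetic samples are hypothetical.

```python
import numpy as np

def gaussian_kernel(y, sigma2):
    """Isotropic Gaussian kernel (22); y has shape (n, N_Y), one point per row."""
    y = np.atleast_2d(y)
    n_y = y.shape[1]
    norm = (2.0 * np.pi * sigma2) ** (-n_y / 2.0)
    return norm * np.exp(-np.sum(y ** 2, axis=1) / (2.0 * sigma2))

def parzen_pdf(points, samples, sigma2):
    """Average of kernels centred on each training sample, evaluated at each point."""
    points = np.atleast_2d(points)
    samples = np.atleast_2d(samples)
    diffs = points[:, None, :] - samples[None, :, :]
    n_pts, n_samp, n_y = diffs.shape
    k = gaussian_kernel(diffs.reshape(-1, n_y), sigma2).reshape(n_pts, n_samp)
    return k.mean(axis=1)

# One-dimensional example with synthetic training samples (illustrative only).
rng = np.random.default_rng(0)
samples = rng.standard_normal((50, 1))
grid = np.linspace(-10.0, 10.0, 2001)[:, None]
pdf = parzen_pdf(grid, samples, sigma2=0.25)
area = pdf.sum() * (grid[1, 0] - grid[0, 0])   # should be close to 1
```

The kernel variance controls the smoothness of the estimate; the value 0.25 here is arbitrary.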

As in [2], the following notation is used for the features
$\mathbf{Y} = \mathbf{A}^T \mathbf{X}$ corresponding to the training data. Let $T_i$ denote the
number of training samples for class $i$, for $i = 1, \ldots, M$, and let
$T = T_1 + \cdots + T_M$ be the total number of training samples. The
features corresponding to the training data for class $i$ are denoted
by $\mathbf{y}_t^i$ for $t = 1, \ldots, T_i$ and $i = 1, \ldots, M$. The superscript is the
class label and the subscript is the index of the vector within the
class. When the class label is not important, the training samples
are indexed with a single subscript, $\mathbf{y}_t$, for $t = 1, \ldots, T$.


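Using (22) to estimate the pdfs in (21), with the class probabilities
estimated as $f_S(i) = T_i / T$, yields the MMI criterion in [2],

$$ I_T(\mathbf{Y}; S) = \frac{1}{T^2} \sum_{i=1}^{M} \sum_{k=1}^{T_i} \sum_{l=1}^{T_i} G_{2\sigma^2}\!\left( \mathbf{y}_k^i - \mathbf{y}_l^i \right) - \frac{2}{T^2} \sum_{i=1}^{M} \frac{T_i}{T} \sum_{k=1}^{T_i} \sum_{l=1}^{T} G_{2\sigma^2}\!\left( \mathbf{y}_k^i - \mathbf{y}_l \right) + \frac{1}{T^2} \left[ \sum_{i=1}^{M} \left( \frac{T_i}{T} \right)^2 \right] \sum_{k=1}^{T} \sum_{l=1}^{T} G_{2\sigma^2}\!\left( \mathbf{y}_k - \mathbf{y}_l \right). \tag{23} $$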
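
The criterion in (23) can be evaluated directly from labeled feature vectors. The sketch below is an illustration, assuming NumPy, class priors estimated as $T_i/T$, and a kernel of variance $2\sigma^2$ (the result of integrating the product of two Gaussian kernels of variance $\sigma^2$); the function names and the synthetic two-class data are hypothetical.

```python
import numpy as np

def pairwise_kernel_sum(U, V, var):
    """Sum of Gaussian kernels with covariance var*I over all pairs (u, v)."""
    d = U.shape[1]
    sq = np.sum((U[:, None, :] - V[None, :, :]) ** 2, axis=2)
    return np.sum(np.exp(-sq / (2.0 * var))) / (2.0 * np.pi * var) ** (d / 2.0)

def quadratic_mi(Y, labels, sigma2):
    """Parzen-window estimate of the quadratic MI; Y has one feature vector per row."""
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    T = len(Y)
    var = 2.0 * sigma2
    v_in = v_btw = prior_sq = 0.0
    for c in np.unique(labels):
        Yc = Y[labels == c]
        p = len(Yc) / T                                  # prior estimate T_i / T
        v_in += pairwise_kernel_sum(Yc, Yc, var)         # within-class pairs
        v_btw += p * pairwise_kernel_sum(Yc, Y, var)     # class-to-all pairs
        prior_sq += p ** 2
    v_all = prior_sq * pairwise_kernel_sum(Y, Y, var)    # all pairs
    return (v_in + v_all - 2.0 * v_btw) / T ** 2

# Two well-separated synthetic classes in one dimension (illustrative only).
rng = np.random.default_rng(0)
Y = np.concatenate([rng.normal(-3.0, 0.5, 40), rng.normal(3.0, 0.5, 40)])[:, None]
labels_true = np.repeat([0, 1], 40)
labels_mixed = np.arange(80) % 2     # labels unrelated to the clusters

qmi_separated = quadratic_mi(Y, labels_true, sigma2=0.25)
qmi_mixed = quadratic_mi(Y, labels_mixed, sigma2=0.25)
```

Features that separate the classes give a larger value than the same features paired with labels that ignore the cluster structure, which is the behavior the maximization relies on.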


We maximize (23) with respect to the matrix $\mathbf{A}$ to obtain the MMI
features, with the constraint $\mathbf{A}^T \mathbf{A} = \mathbf{I}$. The maximization of
quadratic MI can also be applied with nonlinear feature mappings
Y = g(X;T) , where T is a parameter vector.
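
One simple way to carry out a constrained maximization of this kind is sketched below. It is not the algorithm of [2]: as a stand-in, it uses a finite-difference gradient, a QR factorization to restore orthonormal columns after each step, and a backtracking rule that only accepts steps that increase the objective. All names, step sizes, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def quadratic_mi(Y, labels, sigma2):
    """Parzen estimate of the quadratic MI with kernel variance 2*sigma2."""
    def ksum(U, V):
        sq = np.sum((U[:, None, :] - V[None, :, :]) ** 2, axis=2)
        return np.sum(np.exp(-sq / (4.0 * sigma2))) / (4.0 * np.pi * sigma2) ** (U.shape[1] / 2.0)
    T = len(Y)
    total = prior_sq = 0.0
    for c in np.unique(labels):
        Yc = Y[labels == c]
        p = len(Yc) / T
        total += ksum(Yc, Yc) - 2.0 * p * ksum(Yc, Y)
        prior_sq += p ** 2
    return (total + prior_sq * ksum(Y, Y)) / T ** 2

def orthonormalize(A):
    """Return a matrix with orthonormal columns spanning the columns of A."""
    Q, _ = np.linalg.qr(A)
    return Q

def maximize_qmi(X, labels, n_features, sigma2, n_iter=15, step=1.0, eps=1e-5, seed=0):
    """Ascent on the quadratic MI of Y = X A subject to A^T A = I (rows of X are samples)."""
    rng = np.random.default_rng(seed)
    A = orthonormalize(rng.standard_normal((X.shape[1], n_features)))
    objective = lambda M: quadratic_mi(X @ M, labels, sigma2)
    best = objective(A)
    history = [best]
    for _ in range(n_iter):
        grad = np.zeros_like(A)
        for idx in np.ndindex(*A.shape):          # central finite differences
            E = np.zeros_like(A)
            E[idx] = eps
            grad[idx] = (objective(A + E) - objective(A - E)) / (2.0 * eps)
        direction = grad / (np.linalg.norm(grad) + 1e-12)
        trial = step
        while trial > 1e-6:                       # backtrack until the objective improves
            A_new = orthonormalize(A + trial * direction)
            value = objective(A_new)
            if value > best:
                A, best = A_new, value
                break
            trial *= 0.5
        history.append(best)
    return A, history

# Synthetic 4-D data whose classes differ only along the first coordinate.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 4))
X[:30, 0] -= 2.0
X[30:, 0] += 2.0
labels = np.repeat([0, 1], 30)
A, history = maximize_qmi(X, labels, n_features=1, sigma2=0.5)
```

Because steps are accepted only when the objective increases, `history` is non-decreasing, and the QR step keeps the columns of `A` orthonormal throughout.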


4.  RESULTS  USING  MEASURED  DATA 

We  have  applied  the  information-theoretic  fusion  of  magnetic  and 
acoustic  data  with  measured  data  to  classify  vehicles.  The 
experiments  consisted  of  cars,  SUVs,  and  trucks  traveling  along  a 
road  in  both  directions,  left-to-right  (L2R)  and  right-to-left  (R2L), 
with  CPA  ranges  from  19  to  28  ft,  and  at  speeds  of  15  mph  and  25 
mph.  The  data  set  was  too  small  to  estimate  the  probability  of 
classification  error,  so  we  examined  scatter  plots  of  three- 
dimensional  feature  vectors  to  evaluate  discrimination  between 
cars  and  SUVs.  We  observed  the  following  results  from 
processing  the  measured  data.  (1)  The  MMI  features  significantly 
improve  discrimination  compared  with  simple  Fisher’s  LDA  [3] 
features.  (2)  Fusion  of  magnetic  and  acoustic  data  allows 
discrimination,  but  using  magnetic  data  alone  does  not  allow 
discrimination. (3) Incorporation of simple information about the
vehicle’s track (speed and direction) improves feature extraction for
classification. The magnetic signatures vary with the vehicle’s
direction and the acoustic signatures vary with the speed, so
features that are matched to the direction and speed perform better.

Figure  3  contains  a  representative  result,  where  CAR-L  and 
CAR-R  are  training  data  for  (different)  cars  moving  L2R  and  R2L, 
respectively,  and  SUV-L  and  SUV-R  are  corresponding  training 
data  for  SUVs.  The  features  are  derived  separately  for  vehicles 
moving  L2R  and  R2L,  giving  rise  to  the  top  and  bottom  panels. 
The LSUV points in Figure 3 correspond to a new “light SUV”
that is different from the SUVs in the training data, where LSUV-L
and LSUV-R are traveling L2R and R2L, respectively. Note that
the LSUV-L is closely clustered with the SUV-L training data and
LSUV-R is closely clustered with SUV-R, indicating
discrimination of the SUV from the cars.


Figure 3: MMI features for new (non-training) magnetic-acoustic
data from a light SUV (LSUV) moving L2R and R2L at 15 mph.
Panel titles: "MMI - MAGNETIC & ACOUSTIC, L2R" (top) and "MMI - MAGNETIC & ACOUSTIC, R2L" (bottom).